Smart Yoga Assistant: SVM-based Real-time Pose Detection and Correction System
An SVM-based real-time pose detection and correction system is a computer system that uses machine learning to detect and correct a person's yoga pose in real time. Such a system can act as a virtual yoga assistant, helping people improve their practice by providing immediate feedback on form and helping to prevent injury. This paper presents a yoga tracking and correction system that uses computer vision and machine learning algorithms to track and correct yoga poses. The system comprises a camera and a computer vision module that captures images of the practitioner and identifies the poses being performed. A machine learning module analyzes the images, provides feedback on the quality of the poses, and recommends corrections to improve form and prevent injuries. The paper proposes a customized support vector machine (SVM) based real-time pose detection and correction system that suggests yoga practices tailored to specific health conditions or diseases, aiming to provide a reliable and accessible resource for individuals who use yoga as a complementary approach to managing their health. The system also includes a practitioner's interface that delivers personalized recommendations for yoga practice. The system was developed in Python with several open-source libraries and tested on a dataset of yoga poses. Tuning the hyperparameter gamma to optimize classification accuracy yielded 87% on the dataset, better than other approaches. The experimental results demonstrate the effectiveness of the system in tracking and correcting yoga poses and its potential to enhance the quality of yoga practice.
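The core classification step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the keypoint feature layout, class labels, data, and gamma grid are all invented assumptions, with synthetic features standing in for pose keypoints extracted by a vision module.

```python
# Sketch of SVM pose classification with RBF-kernel gamma tuning.
# Synthetic stand-ins replace real keypoint features; everything here
# is illustrative, not the paper's dataset or configuration.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Pretend each pose is a flattened vector of 17 (x, y) keypoints,
# with three hypothetical pose classes (e.g. tree, warrior, cobra).
n_per_class, n_keypoints = 60, 17
X = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(n_per_class, n_keypoints * 2))
    for c in range(3)
])
y = np.repeat([0, 1, 2], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Tune gamma, mirroring the hyperparameter the abstract says was
# optimized for classification accuracy.
search = GridSearchCV(SVC(kernel="rbf"), {"gamma": [0.01, 0.1, 1.0]}, cv=3)
search.fit(X_train, y_train)
accuracy = search.score(X_test, y_test)
print("best gamma:", search.best_params_["gamma"], "accuracy:", round(accuracy, 2))
```

A real system would replace the synthetic features with keypoints from a pose-estimation model and map the predicted class to corrective feedback.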
Personalization for BERT-based Discriminative Speech Recognition Rescoring
Recognition of personalized content remains a challenge in end-to-end speech
recognition. We explore three novel approaches that use personalized content in
a neural rescoring step to improve recognition: gazetteers, prompting, and a
cross-attention based encoder-decoder model. We use internal de-identified
en-US data from interactions with a virtual voice assistant supplemented with
personalized named entities to compare these approaches. On a test set with
personalized named entities, we show that each of these approaches improves
word error rate (WER) by over 10% relative to a neural rescoring baseline. We
also show that on this test set, natural language prompts can improve WER by
7% without any training and with only a marginal loss in generalization.
Overall, gazetteers performed best, with a 10% improvement in WER on the
personalized test set, while also improving WER on a general test set by 1%.
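The second-pass rescoring setup common to these approaches can be illustrated with a toy example: the first pass emits an n-best list with decoder scores, and a neural LM score is interpolated in to re-rank the hypotheses. The hypotheses, scores, and interpolation weight below are invented for illustration and are not from the paper.

```python
# Toy n-best rescoring: re-rank first-pass hypotheses by interpolating
# a second-pass LM score with the first-pass score. All values invented.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # e.g. log-probability from the ASR decoder
    rescore: float           # e.g. log-probability from a BERT-style LM

def rerank(nbest, lm_weight=0.5):
    """Return the hypothesis with the best interpolated score."""
    return max(
        nbest,
        key=lambda h: (1 - lm_weight) * h.first_pass_score
                      + lm_weight * h.rescore,
    )

nbest = [
    Hypothesis("call mom", -1.0, -4.0),
    # A personalized contact name gets a better second-pass LM score,
    # as gazetteers or prompts with personalized entities would encourage.
    Hypothesis("call tom", -1.2, -1.5),
]
best = rerank(nbest)
print(best.text)  # → call tom
```

Here the first pass slightly prefers "call mom", but the second-pass score flips the ranking toward the personalized entity, which is the mechanism the approaches above exploit.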
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
We propose a neural language modeling system based on low-rank adaptation
(LoRA) for speech recognition output rescoring. Although pretrained language
models (LMs) like BERT have shown superior performance in second-pass
rescoring, the high computational cost of scaling up the pretraining stage and
adapting the pretrained models to specific domains limit their practical use in
rescoring. Here we present a method based on low-rank decomposition to train a
rescoring BERT model and adapt it to new domains using only a fraction (0.08%)
of the pretrained parameters. The inserted low-rank matrices are optimized through a
discriminative training objective along with a correlation-based regularization
loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is
evaluated on LibriSpeech and internal datasets, with training times reduced by
factors between 3.6 and 5.4.

Comment: Accepted to IEEE ASRU 2023. 8 pages.
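The low-rank adaptation idea behind this work can be sketched in a few lines: the frozen pretrained weight W is augmented with a trainable low-rank update B @ A, so only r * (d_in + d_out) parameters are adapted. The dimensions, scaling factor, and zero initialization below follow the general LoRA recipe and are assumptions for illustration, not LoRB's actual configuration.

```python
# Minimal numpy sketch of low-rank adaptation (LoRA): freeze W, train
# only the low-rank factors A and B. Dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 768, 768, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero init
alpha = 16                              # LoRA scaling hyperparameter

def lora_forward(x):
    """Frozen path plus scaled low-rank update."""
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = rng.normal(size=(1, d_in))
# With B initialized to zero, the adapter starts as an exact no-op,
# so adaptation begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)

# Fraction of parameters that are trainable in this toy configuration:
frac = (A.size + B.size) / W.size
print(f"trainable fraction: {frac:.2%}")
```

With rank r much smaller than the layer width, the trainable fraction shrinks accordingly; the abstract's 0.08% figure reflects applying this across a full BERT model rather than the single toy layer shown here.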